nearest neighbor
A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice
In the $k$-nearest neighborhood model ($k$-NN), we are given a set of points $P$, and we shall answer queries $q$ by returning the $k$ nearest neighbors of $q$ in $P$ according to some metric. This concept is crucial in many areas of data analysis and data processing, e.g., computer vision, document retrieval and machine learning. Many $k$-NN algorithms have been published and implemented, but often the relation between parameters and accuracy of the computed $k$-NN is not explicit. We study property testing of $k$-NN graphs in theory and evaluate it empirically: given a point set $P \subset \mathbb{R}^\delta$ and a directed graph $G=(P,E)$, is $G$ a $k$-NN graph, i.e., every point $p \in P$ has outgoing edges to its $k$ nearest neighbors, or is it $\epsilon$-far from being a $k$-NN graph? Here, $\epsilon$-far means that one has to change more than an $\epsilon$-fraction of the edges in order to make $G$ a $k$-NN graph. We develop a randomized algorithm with one-sided error that decides this question, i.e., a property tester for the $k$-NN property, with complexity $O(\sqrt{n} k^2 / \epsilon^2)$ measured in terms of the number of vertices and edges it inspects, and we prove a lower bound of $\Omega(\sqrt{n / \epsilon k})$. We evaluate our tester empirically on the $k$-NN models computed by various algorithms and show that it can be used to detect $k$-NN models with bad accuracy in significantly less time than the building time of the $k$-NN model.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- North America > United States > Maine > Cumberland County > Standish (0.14)
- North America > United States > California (0.05)
- Asia > India > Rajasthan (0.04)
- (9 more...)
- Health & Medicine (1.00)
- Education (0.93)
- Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
- North America > United States > California > Monterey County > Pacific Grove (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > California > Yolo County > Davis (0.14)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- (7 more...)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
- Information Technology > Data Science > Data Mining (0.67)
- North America > United States > Indiana (0.04)
- North America > United States > New York > Broome County > Binghamton (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
- North America > United States > Massachusetts (0.04)
- (2 more...)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States > Texas > Dallas County > Dallas (0.04)
- (14 more...)